Neural Voice Cloning with a Few Samples
Voice cloning is a highly desired feature for personalized speech interfaces. We introduce a neural voice cloning system that learns to synthesize a person's voice from only a few audio samples. We study two approaches: speaker adaptation and speaker encoding. Speaker adaptation is based on fine-tuning a multi-speaker generative model. Speaker encoding is based on training a separate model to directly infer a new speaker embedding, which is then applied to a multi-speaker generative model. In terms of naturalness of the speech and similarity to the original speaker, both approaches can achieve good performance, even with only a few cloning audio samples. While speaker adaptation achieves slightly better naturalness and similarity, the cloning time and memory required by the speaker encoding approach are significantly lower, making it more favorable for low-resource deployment.
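The contrast between the two approaches can be illustrated with a toy sketch, in which a trivially additive function stands in for the real multi-speaker generative model; all names and the model itself are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8
true_emb = rng.normal(size=D)  # the target speaker's (unknown) embedding

def generate(text_feat, speaker_emb):
    # Stand-in for a multi-speaker generative model:
    # output = content features + speaker embedding.
    return text_feat + speaker_emb

# A few "cloning samples": (text features, observed audio features).
texts = rng.normal(size=(3, D))
audios = np.array([generate(t, true_emb) for t in texts])

# Speaker adaptation: fine-tune a fresh embedding by gradient
# descent on reconstruction error over the cloning samples.
emb = np.zeros(D)
for _ in range(200):
    grad = sum(2 * (generate(t, emb) - a)
               for t, a in zip(texts, audios)) / len(texts)
    emb -= 0.1 * grad

# Speaker encoding: a separate model infers the embedding in one
# shot; for this toy model the optimal encoder is the mean residual.
enc_emb = (audios - texts).mean(axis=0)
```

Both routes recover the same embedding here; the practical difference the abstract points to is that adaptation needs an optimization loop per new speaker, while encoding is a single forward pass.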
Reviews: Neural Voice Cloning with a Few Samples
This paper investigates cloning voices using limited speech data. To that end, two techniques are studied: a speaker adaptation approach and a speaker encoding approach. Extensive experiments have been carried out to demonstrate the performance of voice cloning, and an analysis of the speaker embedding vectors is also conducted. The synthesized samples sound OK, although not of very high quality given only a few audio samples. Below are my detailed comments.
Neural Voice Cloning with a Few Samples
Sercan Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
Neural Voice Cloning with a Few Samples - Baidu Research
Speaker encoding is based on training a separate model to directly infer a new speaker embedding from cloning audios; this embedding is then used with a multi-speaker generative model. The speaker encoding model has time- and frequency-domain processing blocks to retrieve speaker identity information from each audio sample, and attention blocks to combine them in an optimal way. The advantages of speaker encoding include fast cloning time (only a few seconds) and a low number of parameters per speaker, making it favorable for low-resource deployment. Besides accurately estimating the speaker embeddings, we observe that speaker encoders learn to map different speakers into the embedding space in a meaningful way: for example, different genders, or accents from various regions, are clustered together. This structure can be exploited by applying operations in the learned latent space, for example to convert the gender or accent region of a speaker.
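The attention step that combines per-sample information into one speaker embedding can be sketched as simple attention pooling. This is a minimal illustration, assuming each cloning audio has already been reduced to a fixed-size feature vector; the function names and the single learned query vector are assumptions, not the paper's architecture:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(sample_features, query):
    # sample_features: (n_samples, d), one feature vector per cloning audio.
    # query: (d,) learned attention vector scoring each sample's usefulness.
    scores = sample_features @ query       # (n_samples,)
    alphas = softmax(scores)               # weights sum to 1
    return alphas @ sample_features        # (d,) pooled speaker embedding

rng = np.random.default_rng(0)
feats = rng.normal(size=(5, 16))  # e.g. 5 cloning audios, 16-dim features
query = rng.normal(size=16)
emb = attention_pool(feats, query)
```

Because the weights form a convex combination, the pooled embedding stays inside the span of the per-sample features while letting more informative samples contribute more.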
Neural Voice Cloning: Teaching Machines to Generate Speech
At Baidu Research, we aim to revolutionize human-machine interfaces with the latest artificial intelligence techniques. Our Deep Voice project, started a year ago, focuses on teaching machines to generate speech from text that sounds more human-like. Beyond single-speaker speech synthesis, we demonstrated that a single system could learn to reproduce thousands of speaker identities, with less than half an hour of training data for each speaker. This capability was enabled by learning shared and discriminative information across speakers. We were motivated to push this idea even further, and attempted to learn speaker characteristics from only a few utterances (i.e., sentences a few seconds in duration).